Overview
Brought to you by YData
Dataset statistics
| Number of variables | 3 |
|---|---|
| Number of observations | 27299925 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.9 GiB |
| Average record size in memory | 74.0 B |
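The memory figures in this table are mutually consistent. A minimal arithmetic check, assuming binary units (1 GiB = 2³⁰ bytes) and that each numeric column stores one 8-byte (64-bit) value per row:

```python
# Values copied from the dataset statistics table.
n_rows = 27_299_925
avg_record_bytes = 74.0

# Total size is roughly rows × average record size, expressed in GiB.
total_gib = n_rows * avg_record_bytes / 2**30

# Assumption: a numeric column holds one 8-byte value per row.
numeric_column_mib = n_rows * 8 / 2**20

print(f"total ≈ {total_gib:.2f} GiB")                    # reported as 1.9 GiB
print(f"numeric column ≈ {numeric_column_mib:.1f} MiB")  # reported as 208.3 MiB
```

Under that assumption, the two 208.3 MiB numeric columns plus the 1.5 GiB reported for STATUS add up to roughly the 1.9 GiB total.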
Variable types
| Numeric | 2 |
|---|---|
| Categorical | 1 |
Alerts
| MONTHS_BALANCE has 610965 (2.2%) zeros | Zeros |
|---|---|
Reproduction
| Analysis started | 2025-02-02 08:55:31.229820 |
|---|---|
| Analysis finished | 2025-02-02 08:58:33.123662 |
| Duration | 3 minutes and 1.89 seconds |
| Software version | ydata-profiling v4.12.2 |
| Download configuration | config.json |
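The Duration entry is simply the difference between the two timestamps. A quick check with Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2025-02-02 08:55:31.229820")
finished = datetime.fromisoformat("2025-02-02 08:58:33.123662")

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # → 3 minutes and 1.89 seconds
```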
Variables
SK_ID_BUREAU
Real number (ℝ)
| Distinct | 817395 |
|---|---|
| Distinct (%) | 3.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6036297.3 |
| Minimum | 5001709 |
| Maximum | 6842888 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 208.3 MiB |
Quantile statistics
| Minimum | 5001709 |
|---|---|
| 5-th percentile | 5113173 |
| Q1 | 5730933 |
| median | 6070821 |
| Q3 | 6431951 |
| 95-th percentile | 6759761 |
| Maximum | 6842888 |
| Range | 1841179 |
| Interquartile range (IQR) | 701018 |
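Range and IQR are derived directly from the quantiles above: range = maximum − minimum, IQR = Q3 − Q1. A minimal stdlib sketch of the whole block, assuming linear interpolation between order statistics (the default of pandas' `Series.quantile`, which `statistics.quantiles(..., method="inclusive")` also uses). `quantile_summary` is a hypothetical helper, not part of ydata-profiling:

```python
import statistics

def quantile_summary(values):
    """Hypothetical helper: the quantile statistics block for a list of numbers."""
    # 99 cut points; cuts[k - 1] is the k-th percentile (linear interpolation).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    summary = {
        "minimum": min(values),
        "p5": cuts[4],
        "q1": cuts[24],
        "median": cuts[49],
        "q3": cuts[74],
        "p95": cuts[94],
        "maximum": max(values),
    }
    summary["range"] = summary["maximum"] - summary["minimum"]
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary

# Range and IQR from the reported SK_ID_BUREAU quantiles.
print(6842888 - 5001709)  # range → 1841179
print(6431951 - 5730933)  # IQR → 701018
```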
Descriptive statistics
| Standard deviation | 492348.86 |
|---|---|
| Coefficient of variation (CV) | 0.081564713 |
| Kurtosis | -0.73796627 |
| Mean | 6036297.3 |
| Median Absolute Deviation (MAD) | 353720 |
| Skewness | -0.37218781 |
| Sum | 1.6479046 × 10¹⁴ |
| Variance | 2.424074 × 10¹¹ |
| Monotonicity | Not monotonic |
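Several of these rows are linked by simple identities: variance is the square of the standard deviation, CV is the standard deviation divided by the mean, and the sum is the mean times the number of observations. A minimal stdlib sketch, assuming the sample (n − 1) standard deviation that pandas uses by default and an unscaled MAD (median of absolute deviations from the median), which is consistent with the integer MAD values shown. `describe` is a hypothetical helper:

```python
import math
import statistics

def describe(values):
    """Hypothetical helper mirroring the descriptive-statistics rows."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(values)
    return {
        "std": std,
        "cv": std / mean,
        "mean": mean,
        "mad": statistics.median(abs(v - median) for v in values),
        "sum": sum(values),
        "variance": std ** 2,
    }

# Consistency check against the reported SK_ID_BUREAU figures.
n, mean, std = 27_299_925, 6036297.3, 492348.86
assert math.isclose(std / mean, 0.081564713, rel_tol=1e-6)
assert math.isclose(std ** 2, 2.424074e11, rel_tol=1e-6)
assert math.isclose(mean * n, 1.6479046e14, rel_tol=1e-6)
```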
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5645521 | 97 | < 0.1% |
| 6733619 | 97 | < 0.1% |
| 6176606 | 97 | < 0.1% |
| 6321834 | 97 | < 0.1% |
| 6356432 | 97 | < 0.1% |
| 6356400 | 97 | < 0.1% |
| 6243196 | 97 | < 0.1% |
| 6356352 | 97 | < 0.1% |
| 6356351 | 97 | < 0.1% |
| 6765607 | 97 | < 0.1% |
| Other values (817385) | 27298955 | > 99.9% |
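The common-values table counts each distinct value, keeps the ten most frequent, and folds the rest into an "Other values (k)" row. A minimal stdlib sketch; `fmt_percent` and `common_values` are hypothetical helpers, and the percentage formatting is an assumption that mirrors what these tables display (`< 0.1%` for shares that round to zero at three decimals, `> 99.9%` for shares that round to one). When values tie, as the ten SK_ID_BUREAU values with 97 rows each do, `Counter.most_common` keeps first-seen order, so tied rows may be listed in a different order than in the report.

```python
from collections import Counter

def fmt_percent(share):
    """Format a share in [0, 1] the way the tables display it (assumed)."""
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"

def common_values(values, top=10):
    """Return (value, count, frequency) rows plus an 'Other values (k)' row."""
    counts = Counter(values)
    n = len(values)
    rows = [(v, c, fmt_percent(c / n)) for v, c in counts.most_common(top)]
    shown = sum(c for _, c, _ in rows)
    other_distinct = len(counts) - len(rows)
    if other_distinct:
        rest = n - shown
        rows.append((f"Other values ({other_distinct})", rest, fmt_percent(rest / n)))
    return rows

# Example with a few STATUS-like codes.
sample = ["C"] * 5 + ["0"] * 3 + ["X"] * 2
for row in common_values(sample, top=2):
    print(row)
```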
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5001709 | 97 | < 0.1% |
| 5001710 | 83 | < 0.1% |
| 5001711 | 4 | < 0.1% |
| 5001712 | 19 | < 0.1% |
| 5001713 | 22 | < 0.1% |
| 5001714 | 15 | < 0.1% |
| 5001715 | 60 | < 0.1% |
| 5001716 | 86 | < 0.1% |
| 5001717 | 22 | < 0.1% |
| 5001718 | 39 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6842888 | 62 | < 0.1% |
| 6842887 | 37 | < 0.1% |
| 6842886 | 33 | < 0.1% |
| 6842885 | 24 | < 0.1% |
| 6842884 | 48 | < 0.1% |
| 6842883 | 37 | < 0.1% |
| 6842882 | 8 | < 0.1% |
| 6842881 | 32 | < 0.1% |
| 6842880 | 58 | < 0.1% |
| 6842879 | 39 | < 0.1% |
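Unlike the common-values table, the extreme-value tables are ordered by the value itself: the ten smallest distinct values in ascending order and the ten largest in descending order, each with its row count. A minimal sketch with a hypothetical helper `extreme_values`:

```python
from collections import Counter

def extreme_values(values, k=10):
    """Hypothetical helper: counts for the k smallest and k largest distinct values."""
    counts = Counter(values)
    ordered = sorted(counts)  # distinct values, ascending
    smallest = [(v, counts[v]) for v in ordered[:k]]
    largest = [(v, counts[v]) for v in reversed(ordered[-k:])]
    return smallest, largest

# Example with a tiny MONTHS_BALANCE-like column.
months = [0, -1, -1, -2, -3, -3, -3]
low, high = extreme_values(months, k=2)
print(low)   # two smallest distinct values, ascending
print(high)  # two largest distinct values, descending
```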
MONTHS_BALANCE
Real number (ℝ)
Zeros 
| Distinct | 97 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -30.741687 |
| Minimum | -96 |
| Maximum | 0 |
| Zeros | 610965 |
| Zeros (%) | 2.2% |
| Negative | 26688960 |
| Negative (%) | 97.8% |
| Memory size | 208.3 MiB |
Quantile statistics
| Minimum | -96 |
|---|---|
| 5-th percentile | -79 |
| Q1 | -46 |
| median | -25 |
| Q3 | -11 |
| 95-th percentile | -2 |
| Maximum | 0 |
| Range | 96 |
| Interquartile range (IQR) | 35 |
Descriptive statistics
| Standard deviation | 23.864509 |
|---|---|
| Coefficient of variation (CV) | -0.77629147 |
| Kurtosis | -0.31614292 |
| Mean | -30.741687 |
| Median Absolute Deviation (MAD) | 16 |
| Skewness | -0.76068962 |
| Sum | -8.3924574 × 10⁸ |
| Variance | 569.51479 |
| Monotonicity | Not monotonic |
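The coefficient of variation is negative here only because the mean of MONTHS_BALANCE is negative (every value lies between -96 and 0); its magnitude still describes the relative spread. A quick check with the reported figures:

```python
import math

# Reported MONTHS_BALANCE figures.
std = 23.864509
mean = -30.741687
n = 27_299_925

cv = std / mean
print(round(cv, 6))       # sign follows the mean
print(round(abs(cv), 6))  # magnitude of the relative spread

assert math.isclose(cv, -0.77629147, rel_tol=1e-6)
assert math.isclose(std ** 2, 569.51479, rel_tol=1e-6)
assert math.isclose(mean * n, -8.3924574e8, rel_tol=1e-6)
```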
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -1 | 622601 | 2.3% |
| -2 | 619243 | 2.3% |
| -3 | 615080 | 2.3% |
| 0 | 610965 | 2.2% |
| -4 | 609138 | 2.2% |
| -5 | 602663 | 2.2% |
| -6 | 594277 | 2.2% |
| -7 | 583794 | 2.1% |
| -8 | 573566 | 2.1% |
| -9 | 563804 | 2.1% |
| Other values (87) | 21304794 | 78.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -96 | 43147 | 0.2% |
| -95 | 46542 | 0.2% |
| -94 | 49965 | 0.2% |
| -93 | 53535 | 0.2% |
| -92 | 57300 | 0.2% |
| -91 | 61144 | 0.2% |
| -90 | 65188 | 0.2% |
| -89 | 69383 | 0.3% |
| -88 | 73452 | 0.3% |
| -87 | 77586 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 610965 | 2.2% |
| -1 | 622601 | 2.3% |
| -2 | 619243 | 2.3% |
| -3 | 615080 | 2.3% |
| -4 | 609138 | 2.2% |
| -5 | 602663 | 2.2% |
| -6 | 594277 | 2.2% |
| -7 | 583794 | 2.1% |
| -8 | 573566 | 2.1% |
| -9 | 563804 | 2.1% |
STATUS
Categorical
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.5 GiB |
| Value | Count |
|---|---|
| C | 13646993 |
| 0 | 7499507 |
| X | 5810482 |
| 1 | 242347 |
| 5 | 62406 |
| Other values (3) | 38190 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
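Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts values that occur exactly once. STATUS has 8 distinct codes and none of them appears only once, hence Unique = 0. A small sketch of both counts, using a hypothetical helper `distinct_and_unique`:

```python
from collections import Counter

def distinct_and_unique(values):
    counts = Counter(values)
    distinct = len(counts)                              # number of different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, unique

print(distinct_and_unique(["C", "C", "0", "X", "X"]))  # → (3, 1)
```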
Sample
| 1st row | C |
|---|---|
| 2nd row | C |
| 3rd row | C |
| 4th row | C |
| 5th row | C |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| C | 13646993 | 50.0% |
| 0 | 7499507 | 27.5% |
| X | 5810482 | 21.3% |
| 1 | 242347 | 0.9% |
| 5 | 62406 | 0.2% |
| 2 | 23419 | 0.1% |
| 3 | 8924 | < 0.1% |
| 4 | 5847 | < 0.1% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| C | 13646993 | 50.0% |
| 0 | 7499507 | 27.5% |
| X | 5810482 | 21.3% |
| 1 | 242347 | 0.9% |
| 5 | 62406 | 0.2% |
| 2 | 23419 | 0.1% |
| 3 | 8924 | < 0.1% |
| 4 | 5847 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27299925 | 100.0% |
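These tables group characters by Unicode general category, script, and block; in this export every group is labelled (unknown). As a sketch of what the category grouping measures, the standard library's `unicodedata.category` returns each character's general category, for example `Lu` (uppercase letter) for `C` and `X` and `Nd` (decimal digit) for the digits 0 to 5. The helper below is hypothetical and is not how ydata-profiling builds its tables:

```python
import unicodedata
from collections import Counter

def characters_by_category(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return dict(counts)

print(characters_by_category(["C", "0", "X", "1", "C"]))  # → {'Lu': 3, 'Nd': 2}
```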
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| C | 13646993 | 50.0% |
| 0 | 7499507 | 27.5% |
| X | 5810482 | 21.3% |
| 1 | 242347 | 0.9% |
| 5 | 62406 | 0.2% |
| 2 | 23419 | 0.1% |
| 3 | 8924 | < 0.1% |
| 4 | 5847 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27299925 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| C | 13646993 | 50.0% |
| 0 | 7499507 | 27.5% |
| X | 5810482 | 21.3% |
| 1 | 242347 | 0.9% |
| 5 | 62406 | 0.2% |
| 2 | 23419 | 0.1% |
| 3 | 8924 | < 0.1% |
| 4 | 5847 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27299925 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| C | 13646993 | 50.0% |
| 0 | 7499507 | 27.5% |
| X | 5810482 | 21.3% |
| 1 | 242347 | 0.9% |
| 5 | 62406 | 0.2% |
| 2 | 23419 | 0.1% |
| 3 | 8924 | < 0.1% |
| 4 | 5847 | < 0.1% |
Interactions
Correlations
| | MONTHS_BALANCE | SK_ID_BUREAU | STATUS |
|---|---|---|---|
| MONTHS_BALANCE | 1.000 | 0.010 | 0.046 |
| SK_ID_BUREAU | 0.010 | 1.000 | 0.009 |
| STATUS | 0.046 | 0.009 | 1.000 |
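For the two numeric columns, a rank correlation such as Spearman's ρ is one common choice; which coefficient the report actually applied to each pair is not stated here, so treat this as an assumption. A minimal stdlib sketch of Spearman's ρ with average ranks for ties (`average_ranks`, `pearson`, and `spearman` are hypothetical helpers):

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # close to 1.0
```

A coefficient near 0.010, as reported for MONTHS_BALANCE and SK_ID_BUREAU, indicates essentially no monotonic association.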
Missing values
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
Sample
First rows
| | SK_ID_BUREAU | MONTHS_BALANCE | STATUS |
|---|---|---|---|
| 0 | 5715448 | 0 | C |
| 1 | 5715448 | -1 | C |
| 2 | 5715448 | -2 | C |
| 3 | 5715448 | -3 | C |
| 4 | 5715448 | -4 | C |
| 5 | 5715448 | -5 | C |
| 6 | 5715448 | -6 | C |
| 7 | 5715448 | -7 | C |
| 8 | 5715448 | -8 | C |
| 9 | 5715448 | -9 | 0 |
Last rows
| | SK_ID_BUREAU | MONTHS_BALANCE | STATUS |
|---|---|---|---|
| 27299915 | 5041336 | -42 | X |
| 27299916 | 5041336 | -43 | X |
| 27299917 | 5041336 | -44 | X |
| 27299918 | 5041336 | -45 | X |
| 27299919 | 5041336 | -46 | X |
| 27299920 | 5041336 | -47 | X |
| 27299921 | 5041336 | -48 | X |
| 27299922 | 5041336 | -49 | X |
| 27299923 | 5041336 | -50 | X |
| 27299924 | 5041336 | -51 | X |